A Formal Model of Non-determinate Dataflow Computation 


by 
Jarvis Dean Brock 


A. B., Duke University 
1974 


S. M., E. E., Massachusetts Institute of Technology 
1979 


Submitted to the Department of 
Electrical Engineering and Computer Science 
in Partial Fulfillment of the Requirements for 
the Degree of 
Doctor of Philosophy 
at the 
Massachusetts Institute of Technology 
August, 1983 


© Massachusetts Institute of Technology, 1983 


Signature of Author 
Department of Electrical Engineering and Computer Science 
August 23, 1983 
Certified by 
Jack B. Dennis 
Thesis Supervisor 


Accepted by
Arthur C. Smith 
Chairman, Departmental Graduate Committee 


Table of Contents 


Acknowledgment
1. Introduction
2. Dataflow Graphs and Languages
    2.1 Dataflow Streams
    2.2 A Non-determinate Dataflow Operator
    2.3 Operational Semantics
3. Scenarios: A Model of Non-determinate Computation
    3.1 Fixed Point Semantics for Determinate Networks
    3.2 Fixed Point Semantics for Non-determinate Computation
    3.3 The Incompleteness of History Relations
    3.4 Scenarios: An Informal Introduction
4. The Dataflow Graph Algebra
5. The Scenario Set Algebra
    5.1 Operators of the Scenario Set Algebra
    5.2 Generators of the Scenario Set Algebra
    5.3 Operational Consistency
6. Conclusion
References
Biographic Note




prove that scenario sets are an abstract representation of non-determinate dataflow networks. 


In the conclusion of this thesis, we discuss Pratt's [37] recent generalization of the scenario
model and also enter the ongoing fairness "controversy" by observing the apparent immunity of
the scenario composition rules to the "problem" of the fair merge. Finally, some open problems
for future research are stated.

Figure 2.1. An example dataflow graph


sqrt( (x₂ - x₁)² + (y₂ - y₁)² )


through the graph. 


Many high-level, algorithmic languages [2, 5, 12, 29] have been designed for the specific
task of programming dataflow computers. These languages are quite conventional in appearance
and would not frighten the average programmer. Perhaps the most "radical" action of dataflow
language designers has been the banishment of side effects, an action increasingly appreciated
for its role in simplifying programming language semantics [42]. Readers interested in learning

Dataflow graphs exhibiting history-sensitive behavior may be constructed by joining stream
operators together with feedback, that is, in cycles. Figure 2.2 shows a graph which receives an
input stream and produces an output stream whose n'th value is the sum of the first n input
values. Interconnections with feedback can be adapted to a wide range of history-sensitive
computations, including those as complicated as the interaction of a computer system with
terminal input and output streams.


Although we have only discussed how streams are used in dataflow languages,
conventional programs with input/output primitives causing side effects can also be considered
to generate streams. The first use of streams for parallel computation was Kahn's [22] "simple
language for parallel programming." In this language, processes were written in an Algol-like
language with two primitives, get and put, for reading input from and writing output to process
channels. The cons stream operator may be written in a language very similar to Kahn's as:


Figure 2.2. A history-sensitive dataflow graph 

The operational semantics of dataflow graphs is a global state model of computation.
Because operators may be widely distributed physically and because firings may occur
concurrently, it will in general be impossible actually to observe an executing graph performing a
discrete state transition or even to determine the state of a graph. However, although firings may
be concurrent, they may never conflict (because operator input links are disjoint), and for that
reason token-pushing is a faithful representation of dataflow execution.


In the succeeding chapters of this thesis, we will use token-pushing as a standard to
measure other, more abstract, models of non-determinate computation. We will only invoke
token-pushing informally, to derive the result of executing a dataflow graph so that the actual
result may be compared with those predicted by other models. Although formal properties of
token-pushing will never be used, an intuitive understanding of an operational
semantics of parallel computation is nonetheless essential for the appreciation of the more
abstract ones.

and put operators is determinate, and consequently any network of such processes is also
determinate. Kahn [22] used the fixed point methods of Scott [39, 41, 42] to define semantically
the results of executing the networks generated with his programming language.


Any determinate process (operator, network, or even graph) P may be modeled by its
history function F[P]. When X is the tuple of histories presented on the input ports of P, F[P](X)
will be the tuple of histories produced at the output ports of P.


In Chapter 2, we presented a network Sum whose n'th output was the sum of its first n
inputs. In Kahn's language Sum was defined as:


network Sum in (IN) out (OUT) internal (X)
    X ← cons(0, OUT)
    OUT ← plus(IN, X)
end Sum


The two component processes of Sum have the following very simple history functions:


F[cons](Λ, Y) = Λ
F[cons](x X, Y) = x Y


F[plus](Λ, Y) = Λ
F[plus](X, Λ) = Λ
F[plus](x X, y Y) = x+y F[plus](X, Y)

Λ is the empty history
x, y are single values
X, Y are arbitrary (possibly empty) sequences of values


By replacing each process name within a network definition with its history function, a set of
simultaneous equations is constructed:


X = F[cons](0, OUT)
OUT = F[plus](IN, X)


When the value of the input history IN is fixed, the least fixed point (solution) of the set of
equations gives, as the value of OUT, the output history generated by executing Sum with input
history IN. Thus may the history function of Sum be determined.
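
For finite input histories, the least fixed point of the Sum equations can be approximated by iterating the two equations from empty histories until nothing changes. The following Python sketch is only an illustration of that idea: histories are modeled as tuples, and the function names `cons`, `plus`, and `sum_network` are illustrative choices, not part of Kahn's language.

```python
def cons(a, y):
    """F[cons]: empty if the first history is empty, else its first value followed by y."""
    return a[:1] + y if a else ()


def plus(a, b):
    """F[plus]: pairwise sums, for as long as both histories supply values."""
    return tuple(x + y for x, y in zip(a, b))


def sum_network(inp):
    """Iterate X = cons(0, OUT), OUT = plus(IN, X) from empty histories until stable."""
    x, out = (), ()
    while True:
        new_x = cons((0,), out)
        new_out = plus(inp, new_x)
        if (new_x, new_out) == (x, out):
            return out  # no equation adds anything more: the fixed point is reached
        x, out = new_x, new_out


print(sum_network((3, 1, 4, 1, 5)))  # (3, 4, 8, 9, 14)
```

Each pass extends X and OUT by at most one value, so the iteration ends after a number of passes proportional to the length of the input history.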


Figure 3.1. Keller’s Example (with slight modification) 


of course, accept Keller's ultimate conclusion; however, we believe that his anomaly
demonstrates at most that history relation interconnection rules which ignore causality fail. It
does not show that the necessary causality relations cannot be inferred from history relations by,
for example, requiring that later output of a merge does not "rewrite" earlier output.¹ Keller
shows there are no easy interconnection rules for networks characterized by history relations.
We show there are no interconnection rules.


1. There are several subalgebras of dataflow graphs which exhibit Keller's merge anomaly but
which are amenable to analysis by history relations. A very simple one, consisting of only
two-input two-output graphs, may be constructed using a single non-determinate dataflow
operator computing plus1 ∘ first ∘ merge at both output ports (always the same value on both,
that is, the output values are not separately computed) and a single graph interconnection rule of
composing two graphs by placing them side-by-side and then connecting the rightmost ports of
the left graph to the leftmost ports of the right graph.


Let S₁ and S₂ be the graphs shown in Figure 3.2. Syntactically, S₁ and S₂ may be written:

Sₖ(X, Y) = Pₖ(merge(D(X), D(Y)))

D, P₁, and P₂ are all determinate processes which produce at most two output values. Process D
produces two copies of its first input value. In Kahn's [22] language, it may be written as:


process D in (IN) out (OUT)
    x ← get(IN)
    put x on OUT
    put x on OUT
end D


Both P₁ and P₂ allow their first two input values to pass through themselves as their first two
output values. However, P₁ will produce its first output as soon as it receives its first input, while
P₂ will not produce any output until it has received two input values. In Kahn's language P₁ and
P₂ may be written as:


process P₁ in (IN) out (OUT)
    x ← get(IN)
    put x on OUT
    y ← get(IN)
    put y on OUT
end P₁


process P₂ in (IN) out (OUT)
    x ← get(IN)
    y ← get(IN)
    put x on OUT
    put y on OUT
end P₂


Figure 3.2. Sₖ for k ∈ {1, 2}


As history functions these three processes may be specified as:

D(Λ) = Λ
D(x Z) = x x


P₁(Λ) = Λ
P₁(x) = x
P₁(x y Z) = x y


P₂(Λ) = Λ
P₂(x) = Λ
P₂(x y Z) = x y


Λ is the empty history
x and y are single values
Z is an arbitrary (possibly empty) sequence of values
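
The three history functions translate directly into code, which makes it possible to enumerate the input/output pairs of Sₖ for small inputs. The Python sketch below is an illustration under stated assumptions: histories are tuples, and merge is taken to produce every interleaving of its two input histories (the definition of merge is not given in this section, so `merges` is a hypothetical stand-in).

```python
def D(h):
    """D(Λ) = Λ; D(x Z) = x x."""
    return h[:1] * 2


def P1(h):
    """P1 passes its first two inputs through, one at a time."""
    return h[:2]


def P2(h):
    """P2 produces nothing until it has received two inputs."""
    return h[:2] if len(h) >= 2 else ()


def merges(a, b):
    """All interleavings of histories a and b (assumed behavior of merge)."""
    if not a or not b:
        return {a + b}
    return ({a[:1] + m for m in merges(a[1:], b)} |
            {b[:1] + m for m in merges(a, b[1:])})


def S(P, x, y):
    """Possible outputs of S_k(X, Y) = P_k(merge(D(X), D(Y)))."""
    return {P(m) for m in merges(D(x), D(y))}


# Compare the two networks on a few small input histories.
inputs = [(), (1,), (1, 2)]
same = all(S(P1, x, y) == S(P2, x, y) for x in inputs for y in inputs)
print(same)
print(sorted(S(P1, (1,), (5,))))
```

Because D emits either nothing or exactly two values, the merged history never has length one, which is the only length at which P₁ and P₂ disagree.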

Despite the difference between P₁ and P₂, networks S₁ and S₂ have the same history
relation representation. Neither network produces any output unless it receives some input.
Suppose Sₖ receives the input stream x X at its leftmost input port and no input at its rightmost
port. Then the leftmost D process will produce the output history x x, while the rightmost D will
produce nothing. The streams x x and the empty stream will be merged into the stream x x.
Regardless of whether Sₖ is S₁ or S₂, process Pₖ (P₁ or P₂) will receive x x, two input values, and


Figure 3.3. Tₖ for k ∈ {1, 2}


operator, to the rightmost input port of Sₖ. In Kahn's language, this interconnection may be
specified as:

network Tₖ in (IN) out (OUT) internal (X)
    X ← times5(OUT)
    OUT ← Sₖ(IN, X)
end Tₖ

If history relations are an adequately detailed model of non-determinate dataflow
computation, then networks T₁ and T₂ should have the same history relation, since all their
corresponding components do. However, this is not the case, as can be seen by simulating the
execution of these two networks on the input history consisting of the single input one. In
Figure 3.4 we have "removed the cover" from Sₖ and have reproduced Tₖ with the internal
components of Sₖ clearly shown. Let us now examine all possible computations of first T₁ and

Figure 3.4. Tₖ for k ∈ {1, 2} (exploded)


In Figure 3.10, the value-consistent pairs for T₂ are illustrated both with and without the
merging of the connected ports. However, note that now only one of the merged value-consistent
pairs is causality-consistent. In the other we see a cycle (between the one in S₂'s output and the
five in S₂'s rightmost input). The scenario composition rule correctly reflects the fact that 1 1 is
the only response of T₂ to the input history 1.
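
The operational behavior that the scenario rule must match can be checked by brute force. The following Python sketch explores every order in which the merge inside Sₖ can pass tokens when Tₖ receives the single input 1. It is a simplified stand-in for token-pushing, not the thesis's formal model: the determinate operators (D, Pₖ, times5) are assumed to run as soon as they can, so that only the merge makes choices, and the names `P`, `responses`, and `explore` are illustrative.

```python
def P(k, merged):
    """Output of P_k after it has received the history `merged`."""
    if k == 1:
        return merged[:2]                      # P1 forwards each value at once
    return merged[:2] if len(merged) >= 2 else ()  # P2 waits for two values


def responses(k, first_input=1):
    """All output histories T_k can produce for a one-value input history."""
    results = set()

    def explore(left, right, merged, fed_back):
        out = P(k, merged)
        # times5 followed by the rightmost D: the first output value,
        # multiplied by five, arrives twice at the merge's right input.
        if out and not fed_back:
            right, fed_back = (5 * out[0],) * 2, True
        if not left and not right:
            results.add(out)
            return
        if left:   # the merge passes a token from its left input
            explore(left[1:], right, merged + left[:1], fed_back)
        if right:  # the merge passes a token from its right input
            explore(left, right[1:], merged + right[:1], fed_back)

    # The leftmost D turns the single input into two copies.
    explore((first_input,) * 2, (), (), False)
    return results


print(sorted(responses(1)))  # [(1, 1), (1, 5)]
print(sorted(responses(2)))  # [(1, 1)]
```

In T₁ the first output is fed back before the second copy of the input has been merged, so a five can overtake it. In T₂ no output exists until both copies have passed the merge, so only 1 1 is possible.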


In the beginning of this chapter, we mentioned faithfulness, abstraction, and simplicity as
three goals of a semantic theory. With scenarios, faithfulness to the underlying operational model
results from the manner in which the scenario causality relation reflects the causality implied by
the firing sequences of the operational model. In Chapter 5 we shall make a more formal
assertion of this claim. We have shown that no semantic theory for non-determinate dataflow
computation can be as abstract as history relations, the input/output behavioral
specification for this class of computation. However, scenarios are not so far from this


Figure 3.10. Value-consistent Pairs for T₂(1)


T₂: 1 → 1 1                    T₂: 1 → 1 5
S₂: (1, 5 5) → 1 1             S₂: (1, 5 25) → 1 5
times5: 1 1 → 5 5              times5: 1 5 → 5 25

[Causality diagrams for each pair, without and with port merging, are not recoverable here.]


unattainable goal. They are just history relations augmented with a notion of causality. The most
striking advantage of the scenario model when compared to other proposed models of
non-determinate dataflow computation must be the simplicity of its composition rule. Scenario
composition does not require the solution of complicated fixed point equations over complicated
mathematical domains. Scenarios can be composed with no more complicated mathematical
machinery than the knowledge of partial orders.




Figure 4.2. Port relabeling [· ← ·]


The third, and final, dataflow graph operation is port connection. Port connection [· → ·],
like port relabeling, is a unary operator and is also an operator schema. The port connection
operator [α → β] can be applied only to graphs G with an output port α and an input port β. The
graph G [α → β], illustrated in Figure 4.3, is formed by connecting output port α to input port β.
The connected ports become internal¹ to the new graph and may never play any role in future
graph interconnections. Thus Inport(G [α → β]) = Inport(G) - {β}, and Outport(G [α → β]) =
Outport(G) - {α}. The connection of an output port to two or more input ports is accomplished
by the explicit use of determinate fan-out operators.


Obviously it is quite tedious to build up a graph with these operators. In Figure 4.4 a simple
three-operator dataflow graph for computing the dot product of two two-element vectors, (x₁, y₁)
and (x₂, y₂), is shown and described in our algebraic notation. We assume that each of the three


1. "Restricted" in the Milne-Milner [30] terminology.

Figure 4.3. Port connection [· → ·]


dataflow operators has input ports labeled In₁ and In₂ and an output port labeled Out₁, and that
each input port of the assembled graph should be labeled by the appropriate variable name and
the graph output port should be labeled Result. This rather straightforward interconnection
requires no less than thirteen applications of our operators. However, since we are only
concerned with developing a basis for the formalism of the succeeding chapter, the wordiness of
our notation is of little concern.


Figure 4.4. Two-Element Dot Product


( times [In₁ ← x₁] [In₂ ← x₂] [Out₁ ← Op₁Out₁] ||
  times [In₁ ← y₁] [In₂ ← y₂] [Out₁ ← Op₂Out₁] ||
  plus [In₁ ← Op₃In₁] [In₂ ← Op₃In₂] [Out₁ ← Result] )
  [Op₁Out₁ → Op₃In₁] [Op₂Out₁ → Op₃In₂]
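
The port bookkeeping of the three graph operations can be sketched in a few lines. The Python model below is a hedged illustration that tracks only port label sets, not operator behavior; the class `Graph`, its method names, and the disjointness check in `union` are assumptions made for the example rather than definitions from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    inports: frozenset
    outports: frozenset

    def relabel(self, old, new):
        """Port relabeling: rename the port labeled `old` to `new`."""
        def swap(ports):
            return frozenset(new if p == old else p for p in ports)
        return Graph(swap(self.inports), swap(self.outports))

    def union(self, other):
        """Place two graphs side by side (assumed to need disjoint labels)."""
        assert not (self.inports | self.outports) & (other.inports | other.outports)
        return Graph(self.inports | other.inports, self.outports | other.outports)

    def connect(self, alpha, beta):
        """Port connection: output alpha feeds input beta; both become internal."""
        assert alpha in self.outports and beta in self.inports
        return Graph(self.inports - {beta}, self.outports - {alpha})


def operator():
    """A two-input, one-output operator such as times or plus."""
    return Graph(frozenset({"In1", "In2"}), frozenset({"Out1"}))


op1 = operator().relabel("In1", "x1").relabel("In2", "x2").relabel("Out1", "Op1Out1")
op2 = operator().relabel("In1", "y1").relabel("In2", "y2").relabel("Out1", "Op2Out1")
op3 = operator().relabel("In1", "Op3In1").relabel("In2", "Op3In2").relabel("Out1", "Result")
dot = (op1.union(op2).union(op3)
       .connect("Op1Out1", "Op3In1")
       .connect("Op2Out1", "Op3In2"))
print(sorted(dot.inports), sorted(dot.outports))
```

The construction uses nine relabelings, two unions, and two connections, which accounts for the thirteen operator applications mentioned in the text.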


In Figure 5.1, the scenario for the merge with input history tuple <5 6, 7> leading to the
production of the output sequence 5 7 6 is illustrated. Assuming that the merge has input port
labels In₁ and In₂ and output port label Out₁, the scenario of Figure 5.1 is represented by the triple
<E, V, C> where:

E = {<In₁, 1>, <In₁, 2>, <In₂, 1>, <Out₁, 1>, <Out₁, 2>, <Out₁, 3>}


V(<In₁, 1>) = 5        V(<In₂, 1>) = 7        V(<Out₁, 1>) = 5
V(<In₁, 2>) = 6                               V(<Out₁, 2>) = 7
                                              V(<Out₁, 3>) = 6

C is represented in Figure 5.2 as a Hasse diagram, the usual pictorial
representation of a partial order. When this relation is enumerated as a subset
of E × E, it contains sixteen ordered pairs.
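
The count of sixteen pairs can be reproduced by closing a small set of generating pairs under reflexivity and transitivity. In the Python sketch below, the generating pairs are an assumption (the Hasse diagram itself is not legible here): events are ordered within each port, and each output event depends on the input event that supplied its value. Port labels are written `In1`, `In2`, and `Out1`.

```python
from itertools import product

E = [("In1", 1), ("In1", 2), ("In2", 1), ("Out1", 1), ("Out1", 2), ("Out1", 3)]
V = {("In1", 1): 5, ("In1", 2): 6, ("In2", 1): 7,
     ("Out1", 1): 5, ("Out1", 2): 7, ("Out1", 3): 6}

# Assumed generating pairs of the causality relation C.
generators = {
    (("In1", 1), ("In1", 2)),    # order within input port In1
    (("Out1", 1), ("Out1", 2)),  # order within the output port
    (("Out1", 2), ("Out1", 3)),
    (("In1", 1), ("Out1", 1)),   # the first output 5 comes from In1
    (("In2", 1), ("Out1", 2)),   # the output 7 comes from In2
    (("In1", 2), ("Out1", 3)),   # the output 6 comes from In1
}


def closure(events, pairs):
    """Reflexive-transitive closure of `pairs` over `events`."""
    c = set(pairs) | {(e, e) for e in events}
    changed = True
    while changed:
        changed = False
        for (a, b), (b2, d) in product(list(c), repeat=2):
            if b == b2 and (a, d) not in c:
                c.add((a, d))
                changed = True
    return c


C = closure(E, generators)
print(len(C))  # 16
```

Six of the pairs are reflexive and ten are strict, which agrees with the sixteen pairs stated above.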


Figure 5.1. One Scenario for merge(5 6, 7)


Figure 5.2. Hasse Diagram for a merge Causality Relation


Figure 5.3. Antisymmetry: Cases 2 and 3


Figure 5.4. Antisymmetry: Case 4


Now only the proof of transitivity remains to establish that C* is a partial order.
Suppose <γ, m>, <δ, n>, and <ε, o> are elements of E such that <γ, m> C* <δ, n>
and <δ, n> C* <ε, o>. Once again, from the definition of C*, we have four cases,
the product of the following two: (1) either <γ, m> C <δ, n> or there is an
element <α, p> in E such that <γ, m> C <α, p> and <β, p> C <δ, n>; and (2)
either <δ, n> C <ε, o> or there is an element <α, q> in E such that <δ, n> C <α, q>
and <β, q> C <ε, o>.


If <γ, m> C <δ, n> and <δ, n> C <ε, o> then <γ, m> C <ε, o>, and thus
<γ, m> C* <ε, o>, by the transitivity of C.


Once more, we shall look at only one of the two remaining cases in which the
events of one pair are directly related through C. Suppose <γ, m> C <δ, n> and
there is an <α, q> in E such that <δ, n> C <α, q> and <β, q> C <ε, o>. This case
is illustrated in Figure 5.5. As <γ, m> C <δ, n> and <δ, n> C <α, q> then, by
transitivity, <γ, m> C <α, q> and, from the definition of C*, <γ, m> C* <ε, o>.
Although not required for our proof, it's worth noting that the restrictions on
inter-port causality relations imposed by scenarios imply that δ must be either γ
or α.


For the last case of the last property of a partial order, assume that there exist
events <α, p> and <α, q> such that <γ, m> C <α, p> and <β, p> C <δ, n>, and
<δ, n> C <α, q> and <β, q> C <ε, o>. Furthermore, without loss of generality,
assume that q is at least as great as p and, consequently, that <α, p> C <α, q> as
shown in Figure 5.6. Then from <γ, m> C <α, p>, it follows that <γ, m> C <α, q>
and, in turn, that <γ, m> C* <ε, o>. Again, it's worth noting that the inter-port
causality relation restrictions force δ to be either α or β.


Having proven that C* is a reflexive, antisymmetric, and transitive relation, we
have established that C* is a partial order.
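
The case analysis above uses the following reading of C*: one event precedes another in C* when they are related by C directly, or when the first precedes some output event <α, p> and the matching input event <β, p> precedes the second. The Python sketch below builds C* that way for a small hypothetical scenario and checks the three partial order properties. The example scenario (a copy operator from `in` to `alpha` beside a copy operator from `beta` to `out`) is invented for illustration, and any further hypotheses of the lemma, such as agreement of the values at α and β, are not modeled.

```python
PORTS = ["in", "alpha", "beta", "out"]
E = [(port, i) for port in PORTS for i in (1, 2)]

# Hypothetical causality relation C: events are ordered within a port,
# and each copy operator's output depends on its input.
FEEDS = {("in", "alpha"), ("beta", "out")}
C = {(a, b) for a in E for b in E
     if a[1] <= b[1] and (a[0] == b[0] or (a[0], b[0]) in FEEDS)}


def connection_relation(events, c, alpha, beta):
    """C*: x C* y if x C y, or x C <alpha, p> and <beta, p> C y for some p."""
    star = set(c)
    for x in events:
        for y in events:
            for port, p in events:
                if port == alpha and (x, (alpha, p)) in c and ((beta, p), y) in c:
                    star.add((x, y))
    return star


def is_partial_order(events, r):
    reflexive = all((e, e) in r for e in events)
    antisymmetric = all(a == b for (a, b) in r if (b, a) in r)
    transitive = all((a, d) in r for (a, b) in r for (c2, d) in r if b == c2)
    return reflexive and antisymmetric and transitive


C_star = connection_relation(E, C, "alpha", "beta")
print(is_partial_order(E, C_star))
print((("in", 1), ("out", 1)) in C_star)
```

In this example the connection makes each `in` event a cause of the corresponding and later `out` events, while no later event becomes a cause of an earlier one.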


Figure 5.5. Transitivity: Cases 2 and 3


Figure 5.6. Transitivity: Case 4


The α-β connection relation is not only a partial order but also the smallest partial order
which extends C and relates events at output port α to corresponding events at input port β. This
fact is important to note as it makes our formal definition of scenario composition consistent with
the informal one of Chapter 3.


Given an α-β connection relation C* of a scenario set S, it is straightforward to construct a
corresponding scenario for S [α → β] by "removing" the histories for α and β. The following
theorem, an easy consequence of the preceding lemma, states more precisely how this is
accomplished.


G [α → β]. Thus there are firing sequences in G [α → β] for every scenario in S [α → β]. With
both cases satisfied, the operational faithfulness of scenario sets is established.


The operational faithfulness of scenarios rests in their ability to incorporate physical
causality in system representation. Firing sequences also represent this causality, but in doing so
they include inter-port causalities other than those from input to output ports even though such
causalities cannot be detected when dataflow graphs are connected through unbounded,
time-independent communication channels. By omitting these unobservable causalities, we are
able to develop an abstract, but nonetheless faithful, model of non-determinate dataflow
computation.

non-determinate computation the situation is quite different: high-level constructs for the
problems of this area have only recently begun to appear. Non-determinate dataflow languages
seem to offer some advantages relative to non-determinate languages with a more conventional
shared-memory multi-processing orientation; for example, with streams it is possible to write
programs which exhibit state without side effects. However, through the merge anomaly we have
already shown that the unconstrained interconnection of processes can result in unexpected
semantic complexity even in the seemingly simple dataflow approach to inter-process
communication. Maybe this particular complexity is unavoidable; maybe it is not. A semantic
theory may not reveal how to avoid complexity, but at least it will reveal complexity.